Fast alignment of fragmentation trees

نویسندگان

  • Franziska Hufsky
  • Kai Dührkop
  • Florian Rasche
  • Markus Chimani
  • Sebastian Böcker
چکیده

MOTIVATION Mass spectrometry allows sensitive, automated and high-throughput analysis of small molecules such as metabolites. One major bottleneck in metabolomics is the identification of 'unknown' small molecules not in any database. Recently, fragmentation tree alignments have been introduced for the automated comparison of the fragmentation patterns of small molecules. Fragmentation pattern similarities are strongly correlated with the chemical similarity of the molecules, and allow us to cluster compounds based solely on their fragmentation patterns. RESULTS Aligning fragmentation trees is computationally hard. Nevertheless, we present three exact algorithms for the problem: a dynamic programming (DP) algorithm, a sparse variant of the DP, and an Integer Linear Program (ILP). Evaluation of our methods on three different datasets showed that thousands of alignments can be computed in a matter of minutes using DP, even for 'challenging' instances. Running times of the sparse DP were an order of magnitude better than for the classical DP. The ILP was clearly outperformed by both DP approaches. We also found that for both DP algorithms, computing the 1% slowest alignments required as much time as computing the 99% fastest.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Finding Maximum Colorful Subtrees in Practice

In metabolomics and other fields dealing with small compounds, mass spectrometry is applied as a sensitive high-throughput technique. Recently, fragmentation trees have been proposed to automatically analyze the fragmentation mass spectra recorded by such instruments. Computationally, this leads to the problem of finding a maximum weight subtree in an edge-weighted and vertex-colored graph, suc...

متن کامل

gpALIGNER: A Fast Algorithm for Global Pairwise Alignment of DNA Sequences

Bioinformatics, through the sequencing of the full genomes for many species, is increasingly relying on efficient global alignment tools exhibiting both high sensitivity and specificity. Many computational algorithms have been applied for solving the sequence alignment problem. Dynamic programming, statistical methods, approximation and heuristic algorithms are the most common methods appli...

متن کامل

Efficient Querying on Genomic Databases by Using Metric Space Indexing Techniques

A genomic database consists of a set of nucleotide sequences, for which an important kind of queries is the local sequence alignment. This paper investigates two different indexing techniques, namely the variations of GNAT trees [1] and M-trees [3], to support fast query evaluation for local alignment, by transforming the alignment problem to a variant metric space neighborhood search problem.

متن کامل

A Fast Algorithm for Optimal Alignment between Similar Ordered Trees

We present a fast algorithm for optimal alignment between two similar ordered trees with node labels. Let S and T be two such trees with |S| and |T | nodes, respectively. An optimal alignment between S and T which uses at most d blank symbols can be constructed in O(n log n · (maxdeg) · d) time, where n = max{|S|, |T |} and maxdeg is the maximum degree of a node in S or T . In particular, if th...

متن کامل

QAlign: quality-based multiple alignments with dynamic phylogenetic analysis.

Integrating different alignment strategies, a layout editor and tools deriving phylogenetic trees in a 'multiple alignment environment' helps to investigate and enhance results of multiple sequence alignment by hand. QAlign combines algorithms for fast progressive and accurate simultaneous multiple alignment with a versatile editor and a dynamic phylogenetic analysis in a convenient graphical u...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره 28  شماره 

صفحات  -

تاریخ انتشار 2012